Ruth King, Byron Morgan, Steve Brooks (our workshops and book).
Richard McElreath (book and lecture videos).
Jim Albert and Jingchen Hu (book).
November 2021
R with the Jags software.

\(\Pr(A \mid B)\): Probability of A given B
The ordering matters: \(\Pr(A \mid B)\) is not the same as \(\Pr(B \mid A)\).
\(\Pr(A \mid B) = \displaystyle{\frac{\Pr(A \text{ and } B)}{\Pr(B)}}\)
The chance of the test being positive given you are a vampire is \(\Pr(+|\text{vampire}) = 0.90\) (sensitivity).
The chance of a negative test given you are mortal is \(\Pr(-|\text{mortal}) = 0.95\) (specificity).
From the perspective of the test: Given a person is a vampire, what is the probability that the test is positive? \(\Pr(+|\text{vampire}) = 0.90\).
From the perspective of a person: Given that the test is positive, what is the probability that this person is a vampire? \(\Pr(\text{vampire}|+) = \; ?\)
Assume that vampires are rare, and represent only \(0.1\%\) of the population. This means that \(\Pr(\text{vampire}) = 0.001\).
\(\Pr(\text{vampire}|+) = \displaystyle{\frac{\Pr(\text{vampire and } +)}{\Pr(+)}}\)
\(\Pr(\text{vampire and } +) = \Pr(\text{vampire}) \; \Pr(+ | \text{vampire}) = 0.0009\)
\(\Pr(+) = \Pr(\text{vampire and } +) + \Pr(\text{mortal and } +) = 0.0009 + 0.999 \times 0.05 = 0.0009 + 0.04995 = 0.05085\)
\(\Pr(\text{vampire}|+) = 0.0009/0.05085 \approx 0.018\)
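The arithmetic above can be checked in a few lines of Python (plain arithmetic, no libraries needed):

```python
# Numeric check of the vampire example from the text.
p_vampire = 0.001    # prior: Pr(vampire), vampires are rare
sensitivity = 0.90   # Pr(+ | vampire)
specificity = 0.95   # Pr(- | mortal), so Pr(+ | mortal) = 0.05

p_vampire_and_pos = p_vampire * sensitivity             # 0.0009
p_mortal_and_pos = (1 - p_vampire) * (1 - specificity)  # 0.04995
p_pos = p_vampire_and_pos + p_mortal_and_pos            # 0.05085

p_vampire_given_pos = p_vampire_and_pos / p_pos
print(round(p_vampire_given_pos, 4))  # about 0.0177
```

Even with a positive test, the posterior probability of being a vampire stays below 2%, because the prior probability is so small.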
\[\Pr(\text{vampire}|+)= \displaystyle{\frac{ \Pr(+|\text{vampire}) \; \Pr(\text{vampire})}{\Pr(+)}}\]
A theorem about conditional probabilities.
\(\Pr(B \mid A) = \displaystyle{\frac{ \Pr(A \mid B) \; \Pr(B)}{\Pr(A)}}\)
\[ \Pr(\text{hypothesis} \mid \text{data}) = \frac{ \Pr(\text{data} \mid \text{hypothesis}) \; \Pr(\text{hypothesis})}{\Pr(\text{data})} \]
The “hypothesis” is typically something unobserved or unknown. It’s what you want to learn about using the data.
For regression models, the “hypothesis” is a parameter (an intercept, a slope, or an error variance).
Bayes’ theorem tells you the probability of the hypothesis given the data.
How plausible is some hypothesis given the data?
\[ \Pr(\text{hypothesis} \mid \text{data}) = \frac{ \Pr(\text{data} \mid \text{hypothesis}) \; \Pr(\text{hypothesis})}{\Pr(\text{data})} \]
Due to practical problems in implementing the Bayesian approach, and some wars of male statisticians’ egos, little advance was made for over two centuries.
Recent advances in computational power coupled with the development of new methodology have led to a great increase in the application of Bayesian methods within the last two decades.
Typical stats problems involve estimating parameter \(\theta\) with available data.
The frequentist approach (maximum likelihood estimation – MLE) assumes that the parameters are fixed, but have unknown values to be estimated.
Classical estimates generally provide a point estimate of the parameter of interest.
The Bayesian approach assumes that the parameters are not fixed but are random variables, each described by a probability distribution: a distribution for the parameter.
The approach is based upon the idea that the experimenter begins with some prior beliefs about the system.
And then updates these beliefs on the basis of observed data.
This updating procedure is based upon the Bayes’ Theorem:
\[\Pr(A \mid B) = \frac{\Pr(B \mid A) \; \Pr(A)}{\Pr(B)}\]
Schematically if \(A = \theta\) and \(B = \text{data}\), then
\[\Pr(\theta \mid \text{data}) = \frac{\Pr(\text{data} \mid \theta) \; \Pr(\theta)}{\Pr(\text{data})}\]
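As a concrete illustration of this prior-to-posterior updating, a conjugate Beta-binomial model gives the posterior in closed form. A minimal sketch in Python, with hypothetical numbers (a flat Beta(1, 1) prior and invented data, not figures from the text):

```python
# Prior beliefs about a probability theta: Beta(a, b).
a_prior, b_prior = 1, 1  # flat prior: no strong beliefs about theta

# Hypothetical data: 19 successes out of 57 trials.
successes, trials = 19, 57

# With a Beta prior and binomial likelihood, Bayes' theorem has a
# closed form: the posterior is again a Beta distribution.
a_post = a_prior + successes
b_post = b_prior + trials - successes

post_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(post_mean, 3))  # 20 39 0.339
```

Conjugate models like this are the rare case where the updating is available analytically; in general the denominator must be computed or bypassed numerically.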
\[{\color{red}{\Pr(\theta \mid \text{data})}} = \frac{\color{blue}{\Pr(\text{data} \mid \theta)} \; \color{green}{\Pr(\theta)}}{\color{orange}{\Pr(\text{data})}}\]
\(\color{red}{\Pr(\theta \mid \text{data})}\): The posterior distribution. Represents what you know after having seen the data. The basis for inference; a distribution, possibly multivariate if there is more than one parameter (\(\theta\)).
\(\color{blue}{\Pr(\text{data} \mid \theta)}\): The likelihood. We know that quantity; it is the same as in the MLE approach.
\(\color{green}{\Pr(\theta)}\): The prior distribution. Represents what you know before seeing the data. The source of much discussion about the Bayesian approach.
\(\color{orange}{\Pr(\text{data}) = \int \Pr(\text{data} \mid \theta) \;\Pr(\theta) d\theta }\): Possibly high-dimensional integral, difficult if not impossible to calculate. This is one of the reasons why we need simulation (MCMC) methods - more soon.
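In one dimension the integral in the denominator can be approximated directly on a grid. A sketch in Python, using a binomial likelihood with a flat prior and hypothetical data (19 successes in 57 trials, invented for illustration):

```python
import math

# Grid approximation of Pr(data) = integral of Pr(data|theta) Pr(theta) dtheta
successes, trials = 19, 57  # hypothetical data
n_grid = 1000
grid = [(i + 0.5) / n_grid for i in range(n_grid)]  # theta values in (0, 1)

def likelihood(theta):
    """Binomial probability of the data given theta."""
    return (math.comb(trials, successes)
            * theta**successes * (1 - theta)**(trials - successes))

prior = 1.0  # flat prior density on (0, 1)

# Riemann sum approximating the integral in the denominator.
p_data = sum(likelihood(t) * prior for t in grid) / n_grid

# Posterior density at each grid point, via Bayes' theorem.
posterior = [likelihood(t) * prior / p_data for t in grid]
print(round(p_data, 5))
```

This brute-force sum works with one parameter, but the cost grows exponentially with the number of parameters, which is exactly why MCMC methods are needed.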